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A Wavelet-Based Algorithm for the Spatial Analysis of Poisson Data 
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ABSTRACT 

Wavelets are scaleable, oscillatory functions that deviate from zero only within a 
limited spatial regime and have average value zero, and thus may be used to simul- 
taneously characterize the shape, location, and strength of astronomical sources. But 
in addition to their use as source characterizers, wavelet functions are rapidly gaining 
currency within the source detection field. Wavelet-based source detection involves the 
correlation of scaled wavelet functions with binned, two-dimensional image data. If the 
chosen wavelet function exhibits the property of vanishing moments, significantly non- 
zero correlation coefficients will be observed only where there are high-order variations 
in the data; e.g., they will be observed in the vicinity of sources. Source pixels are identi- 
fied by comparing each correlation coefficient with its probability sampling distribution, 
which is a function of the (estimated or a priori-known) background amplitude. 

In this paper, we describe the mission- independent, wavelet-based source detection 
algorithm WAVDETECT, part of the freely available Chandra Interactive Analysis of Ob- 
servations (CI AO) software package. Our algorithm uses the Marr, or "Mexican Hat" 
wavelet function, but may be adapted for use with other wavelet functions. Aspects 
of our algorithm include: (1) the computation of local, exposure-corrected normalized 
(i.e. flat-fielded) background maps; (2) the correction for exposure variations within 
the field-of-view (due to, e.g., telescope support ribs or the edge of the field); (3) its 
applicability within the low-counts regime, as it does not require a minimum num- 
ber of background counts per pixel for the accurate computation of source detection 
thresholds; (4) the generation of a source list in a manner that does not depend upon 
a detailed knowledge of the point spread function (PSF) shape; and (5) error analysis. 
These features make our algorithm considerably more general than previous methods 
developed for the analysis of X-ray image data, especially in the low count regime. We 
demonstrate the robustness of WAVDETECT by applying it to an image from an idealized 
detector with a spatially invariant Gaussian PSF and an exposure map similar to that 
of the Einstein IPC; to Pleiades Cluster data collected by the ROSAT PSPC; and to 
simulated Chandra ACIS-I image of the Lockman Hole region. 

Subject headings: methods: data analysis — techniques: image processing — X-rays: 
general 
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1. Introduction 

The detection and characterization of astronomical sources becomes increasingly difficult as 
we attempt to observe these sources in the EUV, X-ray, and gamma-ray spectral regimes. There 
are several reasons for this. First, in these high-energy regimes, source data may consist of only 
a few counts, so that wc must rely on the Poisson distribution when making statistical inferences 
rather than using Gaussian statistics that are considerably easier to apply, but that are strictly 
applicable only in the high-counts limit. Second, spatially extended sources, such as supernova 
remnants and galaxy clusters, exhibit bright diffuse emission at high energies which may overlap 
with point sources, rendering the latter more difficult both to detect and characterize. And third, 
the present generation of broad-band high-energy telescopes, unlike optical telescopes, have spatially 
non-uniform point spread functions (PSFs) as an unavoidable by-product of their design. For 
instance, the PSF of the Wolter I-type High Resolution Mirror Assembly (HRMA) on the Chandra 
X-ray Observatory (CXO) has a 50% encircled energy radius that varies in width from 0.3 
on-axis to ^ 10 near the outer edges of an ACIS-I chip (~ 10 off-axis). 

A standard method for the analysis of Poisson count data involves the application of the so- 
called "sliding cell" (Harndcn et al. 1984).'^ In sliding-ccll analysis, two co-aligncd but differently 
sized square cells are placed at each image pixel, with the number of counts in the annular region 
between cells providing an estimate of the local background amplitude at the pixel. This amplitude 
is then used to compute the Poisson significance of the observed number of counts in the inner 
cell. The usefulness of a sliding-cell algorithm is limited both when the field-of-view (FOV) is 
crowded, since overlapping sources cannot be handled and/or nearby sources contributing counts 
to the estimated background may decrease detection sensitivity, and when sources are observed 
off-axis, since uncertainties in the model of the PSF, which generally increase with off-axis angle, 
can greatly affect source property estimates (see, e.g., Kashyap et al. 1994). The sliding cell also 
may not provide accurate source property estimates for extended sources. Source characterization 
can be improved by fitting the detected sources using maximum likelihood methods (as in the 
algorithm of Hasinger, Johnston, &: Verbunt 1994), but the accuracy of this method is still limited 
by uncertainties in the PSF. 

Within the last decade, astronomers have begun to apply wavelet functions to the problem of 
source detection. (For an introduction to the theory of wavelet functions, see, e.g., Mallat 1998 and 
references therein.) These functions are scaleable, oscillatory, have a finite support (i.e., are non- 
zero within a limited spatial regime) , and have an average value of zero; they can be used to define 
a set of basis functions that act as highly localized filters in both the spatial and frequency domains, 
and thus are superior source characterizers. Certain wavelet functions also exhibit the property of 
vanishing moments, which is important for source detection: the integral of the product of a wavelet 
with A'^ vanishing moments and a polynomial of degree < A — 1 is zero. Thus the correlation of 
a suitably chosen wavelet function with a photon counts image will yield correlation coefficients 
which are significantly large only in the vicinity of localized high-order variations in the data, e.g. in 
the vicinity of astronomical sources, which appear as PSF-broadened bumps with infinite power- 



^The method used, e.g., in the CIAO source detection routine CELLDETECT. 



-3- 



series representation. Wavelet-based source detection thus boils down to the statistical problem of 
identifying image pixels with "significantly large" correlation coefficients. 

Damiani et al. (1997a) were the first to present a wavelet-based generalized method for source 
detection and characterization, which they apply to ROSAT PSPC data in a subsequent work 
(Damiani et al. 1997b). Among its features, this method, unlike the others, uses exposure maps 
to handle situations in which, e.g., support rib shadows or the edge of the FOV lie within the 
wavelet support, allowing the analysis of the entire FOV. Their algorithm is, however, instrument- 
dependent in that they must make significant changes to it in order to account for qualitative 
differences between exposure maps of different detectors (e.g. between the exposure maps of the 
ROSAT PSPC and ROSAT RRl; Micela et al. 1999, provide a short description of the changes that 
are necessary to perform HRI image analysis). In addition, while one can apply the Damiani et 
al. method to data from other photon-counting detectors, the PSFs must be very nearly Gaussian. 
Last, they publish a function for the computation of source detection thresholds which is strictly 
valid only when the mean background amplitude per image pixel is ^ O-i cts^pix — ^ -vv^here a is the 
wavelet scale size. 

In this paper (and in Freeman et al. 2001) we describe our own algorithm for wavelet source 
detection and characterization that has been developed for a generic detector and implemented 

in the Chandra Interactive Analysis of Observations (CI AO) routine WAVDETECT.^ In subsequent 
papers (e.g. Kashyap et al. 2001, in preparation) we will discuss the application of this algorithm 
to specific scientific problems. Our algorithm is considerably more general and flexible than others 
that have been developed, in that it can: (1) operate effectively in the low background counts 
regime, which is crucial because of the expected low particle and cosmic background count rates 
for the Chandra detectors (the overall rate being ~ 10~^ and 10~^ ct sec~^ pix~^ for the Chandra 
ACIS and HRC detectors, respectively); and (2) operate effectively regardless of the PSF shape, 
also crucial because of the (non-Gaussian) nature of the off-axis Chandra PSFs.^ It also (3) corrects 
for the effect of exposure variations in a general, non-detector-specific manner. Thus our algorithm 
may be immediately adapted for the analysis of data from virtually any other photon-counting 
detector. 

In §2, we provide a brief description of wavelet functions, and define the Marr, or "Mexican 
Hat" (MH) wavelet function which we use in our algorithm.^ We then present a simple example in 
which we apply the MH function to idealized data, in order to build the reader's intuition about how 
to interpret the results. The MH function has been used often in astronomical wavelet analyses,^ 



''The CIAO software package may downloaded from http://asc.hcirvard.edu/ciao/. WAVDETECT is composed of 
WTRANSFORM, a source detector, and WRECON, a source list generator; these programs may be run separately. 

^While it can operate to a limited extent if nothing at all is known about the PSF, our algorithm is most effective 
if characteristic PSF sizes, e.g. the radii of circles containing 50% of the encircled energy for different off-axis angles, 
are computable. 

®We note that while our algorithm uses the MH function exclusively, one can adapt our algorithm to work with 
other wavelet functions. 

'^However, we must note that the Mexican Hat is not the only wavelet function used by astronomers; for instance, 
Rosati et al. (1995) and Vikhlinin et al. (1998) use the Morlet wavelet function to detect and analyze X-ray clusters in 
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in for example: the analysis of ^^CO spectral maps of the molecular cloud L1551 (Gill & Hemikson 

1990; see also Langer, Wilson, & Anderson 1993, who apply Laplacian pyramid transforms to a 
""^^CO image map of Barnard 5); the analysis of galaxy cluster structure in optically derived catalogs 
(Slezak, Bijaoui, & Mars 1990); the detection and localization of features in optical CCD images 
of galaxies (Coupinot et al. 1992); the analysis of substructures observed in ROS AT PSPC images 
of the Coma Cluster (Vikhlinin, Forman, & Jones 1994); the examination of Einstein HRI and 
ROS AT PSPC images of Abcll 1367 (Grcbcncv ct al. 1995); the detection of serendipitous X-ray 
clusters in archival ROSAT FSFC data (SHARC; cf. Ulmer et al. 1995, Freeman et al. 1996, Nichol 
et al. 1997); and the analysis and modeling of X-ray emission in Abell clusters (Lazzati &; Chincarini 
1998; see also Lazzati et al. 1998) and stellar clusters (Damiani et al. 1997b). 

In §3 we describe the basic steps of our algorithm, in which sources are detected and charac- 
terized. The basic steps of source detection include: the correlation of the wavelet function with the 
data to create the correlation image; the estimation of the local background (if necessary) in each 
pixel; the computation of the source detection threshold in each pixel; accounting for the effects of 
exposure variations within the FOV on the correlation value; and the identification and "cleans- 
ing" of source counts from the image. Source count cleansing is done iteratively, first by analyzing 
the raw data, then the first version of the cleansed data, etc., until the background is estimated. 
This final background is used to compute source detection thresholds, which are subsequently used 
to identify putative sources. Source characterization involves the combination of outputs from a 
number of wavelet scales, and the creation of source cells on the initial image within which source 
properties (positions, count rates, etc.) are estimated. 

In §4 we demonstrate the efficacy of the algorithm by applying it to a large variety of cases, such 
as: an idealized image containing simulated extended and point sources, with spatially invariant 
Gaussian PSF and an exposure map similar to that of the Einstein IPC; a 32-ksec ROSAT PSFC 
observation of the Pleiades cluster (cf. Micela et al. 1996); and a simulated 30-ksec Chandra ACIS-I 
observation of the Lockman Hole region (T. Gaetz, private communication). We then describe the 
differences between our method and previously published methods, especially that of Damiani et 
al., in §5, and provide our summary and conclusions in §6. In future works, our algorithm will be 
applied to the analysis of specific scientific problems, e.g. observed spatial variations in the diffuse 
X-ray background of the Pleiades Cluster (see Kashyap et al. 1996 and Kashyap et al. 2001, in 
preparation) . 



ROSAT PSPC images, while Slezak, Durret, & Gerbal (1994), Biviano et al. (1996), and Pierre & Starck (1998) use 
the so-called "B-splines" in their cluster analyses. Also, there is the a trous algorithm of Starck, Murtagh, & Bijaoui 
(1995), Starck & Murtagh (1998), and Starck & Pierre (1998). Methods based on this algorithm are similar to the 
one which we describe in this paper; however, because their use is limited to particular problems, these methods are 
not sufficiently generalized to be applied to the fuU-FOV data of an arbitrary detector. For instance, these methods 
generally do not take into account exposure variations within the FOV, or are applied only in limited regions not 
greatly affected by vignetting, and they also generally ignore local variations in the background. 
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2. Wavelets and Source Detection 

2.1. Wavelet Properties 

Wavelets can be used to filter an image at a given length-scale.^ This can be seen by considering 
the difference between two smoothed functions: one formed by convolving an (arbitrary) function 
/ with a real-valued, non-negative, infinitely diffcrcntiablc smoothing function G whose size 
is characterized by a scale a and satisfies the condition j ip = 1 (e.g., the Gaussian function), 
and one formed by the convolution of / with the same smoothing function with scale size a + da 
(Holschneider 1995). Because all structure at length-scales smaller than the smoothing scales would 
be suppressed, the difference of these two smoothed functions will provide information about the 
details of / that are introduced at the scale a itself. In the situation of interest to us, the analysis 
of two-dimensional images, the relationship between (p and the wavelet function W' is, in the limits 
dcx, day — > 0, 

n(f.f) = -(..^+%^)«f f). (1) 

ax ay \ oax day) ax ay 

(The reason for our use of a prime symbol will become apparent below.) This is an analyzing 
wavelet, or mother wavelet (hence the subscript ra). Other members of the same wavelet family 
(so-called "atoms") can be generated from this mother wavelet via dilations and translations: 

W\a,h-ax.ay;x,y) ^ ^< f^,^") . (2) 



ax and ay are the dilation parameters, and a and h are the translation parameters. 

There are many functions ^ that can be used to create a mother wavelet. One example is the 
two-dimensional Gaussian function: 

ax ay 2iTaxay \ 2a^ 2a^J 



from which is created the Marr, or "Mexican Hat" (MH), wavelet function: 

at'i '^'^y (4) 



m v J / o 

ax ay Ziraxay 



1 2 n -"-^ 

2 2 



^ W^.n(-,-). (5) 



I'Kaxa. 



y '-'X 

This is the wavelet function we use in our source detection algorithm. (Note that we use the 
function Wm instead of W^, to be consistent with Damiani et al. 1997a.) The MH function has 
a positive kernel with shape similar to a canonical PSF {PW), surrounded by a negative annulus 
{NW; Figure 1). Ellipses with semi-axes of length ■\/2ax-y, 2ax;y, and ^ax;y describe the boundary 
between PW and A^l^, the minimum of the MH function, and the effective support of the MH 
function, respectively. 



For a general introduction to wavelets, see, e.g., Mallat 1998 and references therein, and Daubechies 1992. 
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The MH function offers several advantages which motivate its use for source detection: (1) 

it has two vanishing moments (i.e. the correlation of the MH function with constant or linear 
functions is zero), so that it acts to suppress the contribution of the (generally spatially constant) 
background to the correlation coefficients; (2) while it docs not have compact support, and cannot 
be used to construct a set of orthogonal basis functions, a dyadic sequence of MH functions (i.e., with 
MH functions with scales separated by factors of two) is sufficient to sample the entire frequency 
domain, because of the limited extent of an MH function's Fourier transform; (3) its limited extent 
in both spatial and Fourier domains helps to minimize effects of aliasing; and (4) it is analytically 
manipulable, so that many numerical operations may be performed using analytically derivable 
functions (see Appendix A), which can reduce computation time significantly. 



Before describing our algorithm in detail, we present a simplified example to help build the 
reader's intuition. We assume that we are analyzing a subset of an evenly exposed image with the 
MH wavelet function, far from any edge of the FOV. Within this image, the counts in each pixel, 

Dij, are sampled from a function which has a constant, relatively large (^ 1) amplitude, which 
we denote B. These assumptions allow us to build intuition with a minimum of unnecessary detail 
(such as correcting for exposure variations, etc.), and should not be construed as being reflective 
of limitations on the applicability of our algorithm. 

As will be described in §3, the first step in our algorithm is to correlate the wavelet function 
W and the binned image data D. We denote correlation using the symbol < >; in this case, 
we would write C = < W~kD >, or for a particular pixel, Cij = < W-kD >i,j-^ Because the average 
value of the wavelet is zero and the data are sampled from a constant amplitude function, the 
mean correlation value will tend asymptotically to zero, with statistical sampling causing positively 
and negatively valued deviations from zero in individual pixels. Indeed, for our simple situation, 
the resulting probability sampling distribution (PSD) of correlation values, denoted p(C\B), will 
tend asymptotically to a zero-mean Gaussian, with width (Tg,c ■sJa'^OyB (Freeman et al. 1996, 
Damiani et al. 1996, Damiani et al. 1997a). 

We now assume that there is a tightly bunched clump of counts in our otherwise source-free 
subset image, which could be caused by a Poisson sampling fluctuation or by an astronomical 
source. We assume the clump has Gaussian shape with amplitude Aq and width ctg, which is 
comparable to the size of the PSF of the instrument. The correlation of the MH function with a 
Gaussian yields a MH function that has its centroid at the Gaussian centroid, and has centroid 
amplitude C-a^aApxi^^y) proportional to Aq: 



2.2. A Simple Example 



2 - 



2 2 
^G 



(6) 



max 





This notation deviates somewhat from that of Mallat (1998), in which Ci,j would be written j]; however, 

we feel our notation makes complicated expressions in the remainder of this work more easily interpretable. 
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For our simple example, Cmax > 0, i.e. Cmax is always greater than the mean of the PSD. To 

determine whether we should associated the clump with an astronomical source, wc would compute 
the integral of the PSD from Cmax to oo; if this quantity is smaller than a predefined significance 
threshold (e.g. < 10~^, or Cmax > 4.75(Tg'^c')) then we associate the clump with a source. 

Because cg^c o'xO'yi source detectability will vary as a function of wavelet scale size. In 
the limits (crx,cry) and oo, the ratio £2i;i^^£jil goes to zero, i.e. the source becomes 
undetectable. If we apply symmetric wavelets in our analysis, the maximum value of this ratio 
occurs at = Oy = \f^aG- Hence sources are most easily detected when one analyzes the image 
with a wavelet function that has a size similar to that of the source. Since such a function acts to 
"filter," or selectively enhance, structures of similar scale, this behavior is to be expected. 

If the clump is associated with a previously known source, but is not detected, we can use eq. (6) 
to determine the upper limit on source counts, by substituting the source detection threshold for 
Cmax 5 and solving for Aq (sec, e.g., §4.1). (Because the Gaussian is normalized, Aq is identically 
the number of source counts.) We stress that the use of this method to place upper limits on source 
counts is limited to cases where the PSF shape is that of a two-dimensional Gaussian function, and 
should not be used in the place of simulations if the PSF has arbitrary shape. 

3. Algorithm 

3.1. Source Detection 

Our first objective is source detection: the identification of putative source pixels in binned, 
two-dimensional image data. This identification is normally done by carrying out the steps in 
the algorithm described below separately with each of a number of wavelet functions (see also 
Figure 2). The basic steps in this algorithm include (1) the correlation of the data with the given 
wavelet function (§3.1.1), and (2) the identification of image pixels with correlation values larger 
than pre-defined thresholds for source detection (§3.1.2). The second step requires knowledge of 
the local background in each pixel (due to unresolved point sources, diffuse astrophysical emission, 
particle background, etc.). If the background has not been determined previously, then it must be 
estimated, e.g. via the method described in §3.1.3. Also, instrumental artifacts such as support rib 
structures, hot spots, and the edge of the FOV can adversely affect the second step, resulting in the 
detection of instrumental features in the image, which are astrophysically uninteresting. In §3.1.4, 
we describe a method for rejecting such "instrumental sources." Finally, in §3.1.5 we describe how 
the variances of the correlation and local background amplitudes are estimated. 
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3.1.1. Correlation of the Wavelet Function and the Data 

The first step in the source detection process at a given scale is to compute the correlation 
of the wavelet function W{ax,(Ty]x,y) with the binned, two-dimensional image data D.^^ The 
translation parameters a and 6 shown in eq. (2) correspond to image pixel indices i and j; for a 
given pixel, the correlation value is 

Ci,i = ^ ^ Wi-i' j-j' A'j' (7) 

v r 

= < Wi<D >ij . (8) 

(Henceforth, if a quantity is given with subscripts, we are referring to its value at pixel {i,j); 
otherwise, we are referring to the quantity's array of values.) The interested reader may find 
details about how < W-kD > is computed in Appendix A. 



3.1.2. Computation of the Source Detection Thresholds 

To determine whether pixel {i,j) should be associated with a source, we compute the proba- 
bility of observing the correlation value Cij if there are only background counts present within the 
support of W: 

poo 

Sij = / dCp{C\B,j). (9) 

Sij is dubbed the significance (or the Type I error; see, e.g., Eadie et al. 1971, pp. 215-216), and it 
is the estimated probability that we would erroneously identify pixel {i,j) with a source. In §2.2, we 
indicated how Sij could be determined analytically in the high-counts limit, where p{C\Bij) tends 
asymptotically to a zero-mean Gaussian of width y^2TTax(7yBij. In general, however, p{C\Bij) must 
be determined via simulations (see Appendix B). Because the number of simulations we carried 
out is only sufficient to directly determine significances Sij ^ 10^^, and because strong sources 
may have much greater significances (in a qualitative sense) , we instead compute a source detection 
threshold Co,i,j{So, Bij) via the equation 

/•oo 

So = dCpiC\Bi,j), (10) 

where So is the user-specified threshold significance.^^ If Cij > Co,ij, we associate the pixel (i,j) 
with a source. 

^"in this section, we do not specifically refer to the Mexican Hat wavelet to underscore the fact that our algorithm 
may be adapted for use with other wavelet functions. 

^^Onc choice is So ~ P~^, where P is the number of analyzed pixels in the image; with this choice, the average 
number of false detections is one per image. 
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3.1.3. Background Estimation 

In this section, we describe how the local background counts amplitude Bij is estimated if 
it is unknown a priori (see also Figure 3). While there is no unique way to make this estimate, 
we seek a method that does not depend on a detector's PSF, both for increased generality and 
computational speed, and also because we want our algorithm to be able to detect and analyze 
sources of arbitrary size, not just point sources. We can fulfill this condition by creating background 
maps at each wavelet scale,^^ using a localized function that is wavelet-scale, and not PSF-scale, 
dependent: the wavelet negative annulus NW{ax, o'y) (i.e., W with positive values reset to zero). 
While the function NW is related to a wavelet function, we stress that it itself is not a wavelet 
function, and that its use in background estimation does not constitute a transform. 

If the exposure within the support of NW is constant, and if ax and ay are sufficiently large 
such that the integral of the source counts distribution (i.e. the PSF for a point source) over the NW 
is insignificant with respect to the integrated background, then one can estimate the background 
using the formula 

Bij = ^^\<NWi.D>,j\, (11) 

where Airaxay/ exp(l) is the integrated volume of NW{ax,ay) (see Appendix A. 3 for derivation). 
If on the other hand the exposure is not constant, the exposure map^^ can be used as a weighting 
function: 

= E.,<^^"^>-^- (12) 
''^ < NWi<E >,: ■ ^ ' 



\3 



E^, ^^ . (13) 



where -Bnorm,i,j is the normalized (i.e. fiat-fielded) number of expected background counts at pixel 
(i,j). We ignore the distinction between vignetted and non-vignetted components of the back- 
ground (e.g., the particle background) in our estimate, because of degeneracy. We note that if the 
amplitude of the non-vignetted background component B^^ is known, then one could in principle 
estimate the background using a variation of eq. (13): 

_ < NW^{D - i?NV) > . . 

< NW-kE >ij 



There are three situations in which counts from sources may bias the local background estimate: 
(1) if the estimate is being made within diffuse extended emission; (2) if the pixel in which the 
estimate is being made is a source pixel, but ax and/or ay is smaller than the source size s; or (3) 
if sources are located within the NW. 



These maps are later combined into a single map used in the calculation of source properties. See §3.2.1. 
If one does not provide an exposure map, a flat one is assumed, to account for the edge of the FOV. 
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The first situation is a non-issue, because if the analysis goal is detection, say of a source within 
a supernova remnant, then the diffuse emission should be treated as a local background component: 
for instance, should a clump of counts be associated with a source, or with Poisson fluctuations in 
the background and the diffuse emission? 

In the second situation, the background map will exhibit a "bump" at the location of the 
source, whose amplitude is greatest where the number of source counts within the support of 
the NW is maximized, generally at the source centroid (Figure 4). Within the bump, the source 
detection threshold is overestimated, and thus this perhaps-otherwise-detectable source may remain 
undetected at the given scale. However, this is not a critical problem, since if the source is detectable, 
it will be detected when {ax,cry) ^ s, a scale regime where the bump is minimized, and the issue 
becomes moot. Note that these bumps do not adversely affect source property estimation, since they 
are eliminated when the background map that is used for source property estimates is computed 
(§3.2.1). 

The third situation can be more problematic for source detection. If there arc sources in 
the FOV, then there will always be some image pixels for which the background amplitude is 
overestimated: for instance, for a symmetric wavelet function, these pixels will surround sources in 
circular "rings" of radius Ri 2a (the radius at which NW achieves its minimum value; see Figure 5). 
In these rings, Co is overestimated, so that otherwise-detectable sources whose locations coincide 
with these rings may go undetected. Rings can appear regardless of the scale or source size, and 
thus can impede source detection at all scales. They can also adversely affect the computation of 
source properties, as a background that is overestimated in the vicinity of a source can lead to 
underestimated count rates, etc., in the final source list. (This last point is demonstrated below in 
Figure 16, which shows the effect of rings on the estimation of Pleiades Cluster source properties. 
See §4.2.) 

One way to remove the rings is to remove source counts iteratively from the raw data, via the 
following algorithm: 

1. identify pixels to be "cleansed" using p{C\B) and the initial background map, which we will 
dub Bi; 

2. mask out these pixels or replace their data with other values, creating a new image we denote 

3. estimate Bn{Dn), where n is the iteration number (n > 2); and 

4. if the background map is to be refined yet again, determine C„(-D„), identify pixels to be 
cleansed using p{C\Bn), and return to step (2). 

Then, when the final background map i^gnai is determined, compare the original values C with 
p{C\Bfi^si) to create a final list of putative source pixels. 

Regarding steps (1) and (4), since the goal of this iterative approach is to remove as many 
source counts as possible from the raw data, we advocate an aggressive approach to identifying the 
pixels to be cleansed: the threshold significance should be set high, e.g. to So = 10^^ (although never 
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higher than 0.05, which corresponds to the oft-used 95% rejection level of statistics). Regarding 
step (2), while masking is used by Damiani et al. (1997a) in their two-iteration approach to source 
detection, it would preclude us from using FFTs to calculate C{Dn). Thus we replace the data in 
cleansed pixels with the inferred background amplitude. 

There are no rigorous quantitative rules governing how one should specify the number of it- 
erations, as that can depend on the crowdedness of the field, the source distribution, the source 
strengths, and the wavelet scale size.-'^'^ We do note that iterative cleansing will cease if the back- 
ground map does not change from one iteration to the next, i.e., if no new pixels are marked for 
cleansing. 



3.1.4- Treating the Effect of Exposure Variations on Source Detection 

In §3.1.2, we describe how we use probability sampling distributions p{C\Bij) to identify 
sources. These distributions are derived assuming a spatially constant exposure. If, for instance, 
the exposure map exhibits localized high-order variations, then the list of detected sources may 
contain a mixture of astrophysically interesting sources and "instrumental sources" aligned near 
support rib shadows, near the edge of the FOV, at the location of hot pixels, etc. (See, e.g., 
Figure 1 of Damiani et al. 1997b.) Thus, an efficacious source detection algorithm should include 
additional calculations that act to decrease the detectability of instrumental sources, while leaving 
the detection efficiency of astrophysical sources unchanged. ^^'^^ 

One possibility is to construct new sampling distributions p{C\Bij,E) for each observation, 

taking into account all the exposure variations that can appear within the wavelet support; however, 
this is not computationally practical. Instead, we estimate the systematic effect that exposure 
variations have on the correlation coefficients. Assuming the null hypothesis, we may write the 
observed correlation coefficient as 

Cij = < W*D >ij = < W*B >ij + ACij , (14) 

where Sjj is the estimated (noise-free) background intensity and AC^ j is the noise (and possibly 
source count) contribution to C^j. We ignore the latter term (see the caveats below) and rewrite 

^''For a typical, uncrowdcd Chandra field, two iterations (i.e. one round of source count cleansing) arc usually 
sufficient, because the high resolution of Chandra reduces source crowding relative to that observed in, e.g., ROSAT 
data. However, one should always verify that this is the case with one's specific image! 

^®The distributions p{C\Bi^j) are also derived assuming a spatially constant background map, from which simulated 

(lata arc sampled. Thus an efficacious source detection algorithm should also include calculations that mitigate the 
effect of background variations caused by, e.g., X-ray shadows. The current algorithm does not take such variations 
into account; anecdotal evidence (e.g. in §4.2) indicates that they have little efTect, possibly because they are far less 
"sharp" than variations induced by support rib shadows, etc. Note that if the background is known a priori, one 
can in principle remove the effect of high-order background variations on correlation values using a transformation 
similar to the one described below. 

^^Note that exposure corrections are not mandatory-for instance, the user may choose to have no corrections made 
if the analysis goal is scale-by-scale characterization of sources in correlation space. See the caveats below. 
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the former so that its dependence on exposure variations is exphcit: 



< W*B >ij = < W*EBnorm >i,j 



= X! XI Wi-i',j-j'Ei',j'BnoTm,i',j' 
V j' 

— ^ ^ ^ ^ (-^jJ-^norm,i'J' -^i'J')-^norm,i'J') 

= Eij < W*Bnorm >i,j - < W*{SEBnorm) >i,j ■ (15) 

The quantity SEij-i'ji encapsulates exposure variabiHty within the wavelet support, and thus the 
last term in eq. (15) encapsulates the effect of exposure variability upon Cjj; subtracting this term 
from Cij yields an "exposure-corrected" quantity that contains only information of astrophysical 
value: 

Ccoi,i,j = ^i,3~ ^ W^*(<^-S-^norm) >ij 

= Qj- < VF*SSnorm >ij +Ei,j < W*B^^,^ >ij . (16) 

It is this quantity that is compared with the distribution p{C\Bij) to determine whether (i, j) is a 
source pixel. 

If -Bnorm is Constant (or linear) within the wavelet support, then eq. (16) reduces to 

CcT^r = Cij - < W*E . (17) 

Because < W*E > is computed only once, we dub this the "fast" exposure correction, as opposed 
to the "full" exposure correction of eq. (16). One should not use the "fast" correction if non-linear 
structures (caused, e.g., by X-ray shadows) are apparent in the background map. 

One should keep the following caveats in mind: 

1. Strictly speaking, Ccor,i,j still cannot be directly compared with the probability sampling 
distribution p{C\Bij) because the noise term ACjj is itself uncorrected. If wc concentrate on 
the issue of false positives (i.e. assume that there is no source count contribution to ACij), 
the important question is: is the asymptotic width of the distribution from which ACcor,ij is 
sampled smaller than the width of the distribution from which ACij is sampled? If so, then 
the rate of false detections will still be greater than expected. This is a problem if and only 
if for a given pixel, Eij is smaller than the average exposure £^ave over the wavelet support 
(i,j), i.e. this is only a problem within troughs or beyond the edge of the FOV. To see this, 
harken back to the simple example of §2.2: what would happen to the width of p{C\Bi_j) if 
we were to reduce the exposure? Fewer counts would be detected, so Bij would decrease, and 
the width of the noise distribution, which is oc \fBj^j^ would also decrease. Thus we suggest 
that one should carefully scrutinize all sources detected in low-exposure regions (^ QflE^^^. 

2. Note the distinction between the correlation maps C and Ccor, especially if the analysis goal is 
not just source detection, but also image decomposition (the scale-by-scale characterization of 
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sources in correlation space). ^ The quantity Ccor represents a pixel-by-pixel wavelet filtering 

not of the raw data D, but of the quantity D — £^i?noiTii + ^ij-Bnorm (cq. 16). Because i?norm 
may be estimated using the NW, which is most sensitive to low-frequency components of 
the data (see Figure 6), these modified data may be "contaminated" with low- frequency 
information (although wavelet filtering [eq. 16] mitigates the effect of the contamination). 

3. We note that while systematic overestimates of -Bnorm (caused for reasons discussed in §3.1.3) 
adversely affect the computation of Ccon they will not lead to an increased number of false 
detections. This is because the only situation where such a systematic overestimate affects 
non-source pixels in the final, refined background image is when the background is computed 
in regions of extended diffuse emission. In this situation, the algorithm treats the sum of the 
real background and the diffuse emission as the "background," so the rate of false detections 
(which is independent of background amplitude) will be unchanged. 



3.1.5. Variance Estimation 

We estimate the variances of Bij (if one does not provide a background map) , and of Cij (or 
C cor, i,j or C^or^'i^j^), using the standard formula (Eadie et al. 1971, p. 23) 

V[Y] = V[Y,Y.'^i,jXi,j] 

i j 

= EE ^j^^ij] + 2 E E E E ai,,a,.,,vcov[X,,„ (18) 

i j i i'>i j j'>j 

where Y is quantity of interest and Xij are random variables (either Dij or functions of Aj)- For 
instance, 

i' j' 

i' f 

= EE^'-'j-zA',/ (19) 

i' j' 

= < W^*D >ij . 

Note that we make two assumptions when deriving this formula: (1) the datum Di'ji is sampled 
from a Poisson distribution with variance Di'jr; and (2) each pixel's raw datum is independently 
sampled (so covariance terms do not contribute to ^[Cjj]). 

We list variance formulae related to source detection in Table 1. The reader should keep in 
mind two important caveats about them: 



Note that source characterization in WAVDETECT is done using the raw data themselves and not using correlation 
coefficients. Aside from source detection, the only other place where the exposure-corrected correlation map is used 
in WAVDETECT is in the creation of the noise-free, exposure-corrected image of detected sources (§3.2.5). Thus this 
caveat is only an issue if the user desires to analyze correlation maps outside of WAVDETECT. 
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1. These formulae ignore the contribution of the covariance terms, which are non-zero for T^[Ccor] , 
V[CcoF^°^], and V[B] if the data are iterativcly cleansed, i.e. if B is a not just a function of 
the raw data D only. We ignore these terms because even the simplest covariance compu- 
tation, that for a two-iteration background map (see Appendix C), has a staggeringly high 
computational cost: we find that the CPU time needed to compute the variance increases by 
a factor ~ O^dxdya^a^), where dx and dy are the x- and y-axis lengths in pixels, respectively. 
Also, additional arrays containing information needed to compute the covariance terms must 
be kept in memory, so there is a resource cost as well. We find that including covariance 
terms increases the variance by a median value of ~ 7%, and at most by only ~ 30% adjacent 
to strong sources, although this is a source-strength- and source-geometry-dependent result 
that obviously cannot be blindly applied to all fields. Ultimately it is up to the user to judge 
whether adding the computation of covariance to our base algorithm is worthwhile. 

2. When computing for the final background map, we make the simplifying assumption 
that the variance of a cleansed datum is equal to the cleansed datum itself; for instance, for 
a two-iteration background map, 



i' j' 

= Y^'^,^,3,'D2,','^ (20) 



where 



= < ^^^^ >„ • 

We make this assumption because if the data are a mixture of raw data and background 
estimates, the variance estimates become increasingly complicated: for one iteration. 



V[Dij] = D. 



for two iterations. 



Dij uncleansed pixel 

V[D2,i,j] = \j2i'J:j'<^,i'j,j'Di',j' cleansed pixel (^2) 



etc. 



3.2. Source Characterization 

Once we have identified putative source pixels at each of a number of wavelet scale size pairs 
(axjCFy), our next objective is source characterization, wherein we combine information derived at 
each scale pair to generate a final source list and to estimate source properties (see Figure 7). 
Unlike source detection, source characterization algorithms can be arbitrarily complex, depending, 
for instance, upon whether one wishes to use detailed PSF information. Our method is particularly 
simple, in that we use only the characteristic PSF size at a given pixel, rpsF,i,j- This size may 
be associated with, e.g., 50% encircled energy; we find that smaller values, such as 39.3% (which 
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corresponds to the integral of a symmetric normahzed two-dimensional Gaussian to radius era), 
work better than large values. While using the detailed PSF shape may allow for more accurate 
estimates of source properties, the simplicity of our scheme makes it more immediately applicable 
to images from virtually any counts detector (including those for which calibration is on-going or 
is for other reasons incomplete). 



3.2.1. Corrected Background Estimate 

In §3.1.3, we describe how we calculate a scale-dependent normalized local background estimate 
-Bnorm, by assuming that there are no source counts in the negative annulus, NW. Source detection 
itself is not markedly affected if this assumption is violated and the background overestimated, 
for reasons given in §3.1.3, but an overestimated background will adversely affect the computation 
of source properties. We create a new, corrected, background estimate by combining information 
across scales (denoted with a subscript fc), noting that the assumption that there no source counts 
in NW is always violated around sources if either a^^k or ay^j- is less than the source size: 

r>l _ Z^fc=l ^i,j,kO'x,kO'y,k't>norm,i,j,k /no\ 

-°norm,i,j — „jv • K'^'J) 

Z^k=l ^i,j,kO'x,k(^y,k 

N is the number of scale pairs used, and 

k = I mm{V2aa:,V2ay)>mrpsF,i,j (^24) 
1 otherwise 

m is a multiplicative factor, set to one when estimating the properties of point sources, and to 
larger values when extended sources are analyzed. We use eq. (18) to estimate the variance of 
-^norm,j,j> with covariance terms ignored: 




\ 2 

(^i,j,k^x,k^y,k \ 
Z^fc=l ^i,j,k(^x,k<^y,k / 



V[Bnorm,i,j,k] ■ (25) 



y[BnoTm,i,j,k\ is estimated using the approximate equation listed in Table 1. 



3.2.2. Source Cells 

The next step in source characterization is to determine which pixels of the original image D 
are to be associated with each detected source. We term contiguous pixels which are associated 
with a particular source a source cell, and as we describe below in §3.2.4, we use the raw data in 
a source cell to determine a source's properties. We need to create source cells because, as noted 
above, we do not use PSF shape information in our algorithm, and because we want an algorithm 
that is applicable to both point and extended sources. Source cells are not computed in algorithms 
such as the sliding cell, where the integrated PSF volume and sum of (point) source counts for a 
given user-defined cell can be used to estimate the total number of (point) source counts, etc. 



To create source cells, we first compute source count images, smoothing the raw data with 
the positive kernel of the wavelet function, PW, at user-specified scales, and then subtracting the 
corrected background image, B': 



(We denote these images SC to avoid confusion with cither the significance S or the correlation 
image C.) We use PW as the smoothing function because it has the desirable properties of being 
localized, and, for the particular case of a symmetric MH function, of mimicking the shape of a 
canonical Gaussian PSF. In regions where there are no sources, source count image values are either 
zero, or positive and nearly zero; only in the vicinity of sources do the values deviate markedly 
from zero. Thus the source count images appear to contain numerous "islands" of non-zero flux 
in a sea of zero values, with their relative size increasing with size of the smoothing function PW 
(see Figure 8). Each island contains one or more peaks, and sub- islands may be defined using 
each peak, with saddle points providing the boundaries between them. (Sub-)islands observed in 
selected source count images define the source cells. 

To define a source cell, we must select a source counts image and then determine to which 
(sub-) island the putative source belongs. The selection proceeds as follows. The location of a 
putative source in correlation-space is assumed to be the location of the correlation maximum, 
{ic,jc)- At this location, we compute the PSF size, in pixels, rpsF,ic,jc- (This introduces a bias 
towards point sources; we return to this point below.) We then select the source count image with 
smoothing scale "closest" to rpsF,ic,jc minimizing |log20"fc — log2T'psF,icjcl' where ak is defined 
for each scale pair: 



On the selected source count image, we examine pixel {ic,jc)- the (sub-)island to which this pixel 
belongs defines the source cell. 

A source cell defined in this manner has advantageous properties: (1) if ak ~ rpsF,ic,jc' then 
nearly all isolated point source counts should lie within a source cell (Figure 9); (2) exposure 
variations are taken into account via the use of < PW-kE > in eq. (26), so that source cells are 
not truncated near, e.g., support rib shadows; and (3), as noted above, saddle points in the source 
count images provide natural boundaries between sources in crowded fields (Figure 10). 

We note two situations where care must be exercised when interpreting results. First, the 
source cell for an extended source may be too small if the smoothing scale is ~ ^^psf, tcJc- '^^^ 
steps one must take to deal with this situation will vary depending upon analysis circumstances, 
but one possible step is to create only one source count image, with ak ^ '^PSFjicjc ^^'^ to use 
this image to define the cell, while being careful to note whether previously detected point sources 
are located within it. (See, e.g., §4.1.) Another situation for which care must be exercised is when 
the PSF is bimodal or otherwise strangely behaved (such as the off-axis Chandra PSF); two or 
more source cells could be created for one detected source. The necessary steps to deal with this 
situation depend upon the details of the detector itself, and thus we will not discuss this particular 
situation further here. 




(26) 



exp 



log(q"a;) + log('7^) 

2 
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3.2.3. Source Rejection 

Because the same source will generally be detected at multiple scales, and to further decrease 
the possibility of finding false sources, it is necessary to reject sources from the lists of correlation 
maxima generated at each scale. A maximum observed at {ic,jc), for the scale pair {(yx,k^(^y,k)^ is 
rejected from fTirthcr consideration if any of the following conditions are met: (1) iicijc) lies in a 
previously defined source cell; (2) SCi^j^^k = 0; (3) the ellipse defined by 

{■r ic? , {y-3cf _ , 



contains one or more previously defined sources detected at smaller scales (this can occur when ax,k 
or ay^k is ^ ^psf, since previously identified sources will eventually merge if the field is crowded, 
creating "new" sources at new locations); and (4) if, after the source cell is defined for a particular 
scale, it is found not to contain any correlation maxima at that scale. This last check is aimed at 
rejecting small-scale Poisson fluctuations that may be observed in the background data. 



3.2.4- Source Properties 

We present the formulae we use to estimate source properties and their variances in Tables 2 
and 3 respectively. The summations performed when making these estimates are carried out over 
the pixels within the source cell. We use the raw counts data, Di j , as a weighting function, instead 
of the source fiuence Dij — EijB'^^j.^ j^ ,j, because the use of the latter can greatly complicate the 
estimation of variances. Using the data rather than the source fiuence will lead to similar estimates 
when the background amplitude is small relative to source amplitude. 



3.2.5. Noise-Free Source Image 

We can use the information present in the correlation images and the source cell image to create 
a "noise-free" rendering of the observed source data, SD, with the effect of exposure variations 
removed: 

Dividing the correlation value by a^^k and ay^k restores the normalization contained in eq. (5), 
allowing a scale-by-scale summation. The quantity i^ij^k = 1 if (1) Ccoi,i,j,k > 0, (2) the local 
maximum corresponding to {i,j) has been identified as a source pixel, and (3) the associated local 
correlation maximum is contained within a source cell (the second condition ensures that random 
maxima which are not associated with a source but which happen to lie within a source cell are 
not included in the source image; the last condition ensures that rejected sources are not included); 
otherwise, = 0. 
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4. Verification 

To verify its source detection and characterization capabilities, we apply our algorithm to: (1) 
1 and 10 ksec observations by an idealized detector with a spatially invariant PSF; (2) a 32-ksec 
ROSAT PSPC observation of the Pleiades Cluster; and (3) a simulated 30-ksec Chandra ACIS-I 
observation of the Lockman Hole region. These tests allow us to demonstrate that our algorithm 
can efficiently detect and accurately describe wcU-samplcd sources in an uncrowdcd field, and can 
effectively analyze crowded fields, even in the low-background limit of the Chandra detectors. 

4.1. Idealized Detector with Spatially Invariant PSF 

We first demonstrate that our algorithm can efficiently detect, and accurately describe, well- 
sampled sources in an uncrowded field. We apply it to two 512x512 images, hereafter Images 
A and B, that represent 1 and 10 ksec observations by an idealized detector with an effective 
area 1000 cm^ and a spatially invariant Gaussian PSF of width crpsF = 2.56 pixels (Figure 11). 
The exposure map for this detector is similar to that of the Einstein IPC. Within each image 
we randomly place 42 point sources, and 4 extended sources with elliptical shape. The fluxes of 
the point sources were sampled from a \ogN — logS" distribution with slope —1.5, above 10""*^^ erg 
cm^^ sec^^. We also simulate a locally variable background (amplitude ~ 10^^ ct sec^^ pix^^) by 
setting the background amplitude at five reference points, performing minimum-curvature-surface 
interpolation, and sampling background data in each pixel. 

We analyze the images assuming the input parameters listed for Test 1 in Table 4. In Figure 12, 
we plot the source counts for detected sources, and the upper limits for undetected sources, against 
the number of predicted counts. Upper limits are defined using the source detection threshold values 
at the correlation maxima nearest the location of the undetected sources, and are computed using 
eq. (6), with cjx = CTy = \/?>(Jq = 4.43 pixels. We conclude that our algorithm efficiently detects 
and describes point sources with ^ 10 counts. This is not an absolute quantity: the minimum 
number of counts needed for source detection varies as a function of the background amplitude and 
source size (see §2.2). Hence while we may conclude that a 1 ksec observation is sufficient for the 
detection by Chandra of nearly on-axis point sources with fluxes ^ 10^^'' erg cm~^ sec~^, since 
it has a similar effective area as, and lower expected background count-rates than, our idealized 
detector, it may not be sufficient far off-axis, or for detectors that have higher rates of background 
accumulation. 

In Figure 13, we show the cells for the detected sources, along with the input source locations. 
We use a wavelet scale of 2\/2 pixels to create the source counts image that is in turn used to 
delineate the source cells. This scale is the closest "standard" scale to the assumed PSF size (if we 
assume scale sizes separated by factors of \/2 rather than 2 for greater source detection efficiency). 
In both images, our algorithm detects one false source, which is consistent with the assumption of 
So = 10-6 for a 512x512 image. 

Creating source cells by using information derived at a wavelet scale close to the PSF size is 
not optimal when one wishes to analyze and describe extended sources, as discussed in §3.2.2, and 
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indeed by examining Figure 13 we can see that the extended source cells are undersized. There 

are many ways by which an analyst may wish to treat extended sources; here, we show how one 
could derive an image showing the (normalized) number of counts per pixel within the extended 
source. We use as our example the largest extended object in Image B. First, we must expand the 
source cell so that it just encloses the extended source. To do this, we increase the minimum scale 
size at which a source counts image is to be computed (in this example, from 2\/2 pixels to 8\/2 
pixels; see Figure 14a-b). (Note that the parameter m in cq. 24 must also be increased so that the 
background is not overestimated within the source.) Second, we would use the new source cell as a 
spatial filter, applying it to the original source counts image created at the PSF scale (e.g. Figure 
14c), or to the data, etc. 

(If one outputs source count image data, one can then, in principle, fit directly to them. For 
instance, the image may be of a galaxy cluster, and one may wish to assess the detectability of its 
constituent galaxies. However, care must be exercised since the data in contiguous pixels are not 
independent. Taking into account the width of the PW used to smooth the raw image data, we can 
state that data > ^/2a pixels apart are independent. Thus simple statistical fitting can be done to 
a sparse grid of data. This process of fitting would be essentially equivalent to the "decimation" 
method described by Lazzati et al. 1998, except that they fit to correlation image data.) 

4.2. ROSAT PSPC: The Pleiades Cluster 

Next, we demonstrate that our algorithm outperforms the sliding cell in efficiently detecting 
sources in a crowded field, by applying it to the deep (32 ksec) ROSAT PSPC observation of the 
core of the Pleiades Cluster (RP200068, cf. Micela et al. 1996). These data were obtained in two 
segments separated by roughly six months, and the slight boresight offset between the two segments 
has been corrected using a method described by Micela et al. The data were also filtered to exclude 
times of high background contamination, and to exclude pulse-height-invariant (PI) channels at 
both the low-energy (PI<20, to avoid the so-called "ghost image" problem; Nousek & Lesser 1993) 
and high-energy (PI>201, where no instrument map is available to determine exposure variations) 
ends of the spectral response. We have computed exposure maps taking into account these changes 
using software developed by Snowden &; Kuntz (1998). 

In Table 4, we show how the number of detected sources varies as a function of the number of 
iterations, the spacing of scales, the exposure correction method, and the source detection signifi- 
cance. Comparing Tests 1 through 6, for which So is constant, we find that the number of detected 
sources changes little if the number of iterations is increased beyond two, or if the full exposure 
correction method (eq. 16) is used instead of the fast one (eq. 17). However, there is an ~ 5% 
increase in source yield by analyzing the image with the scale sizes spaced by factors of instead 
of 2. The extra sources are relatively weak sources whose probability of detection is maximized 
around the scales \/2 pixels, 2\/2 pixels, etc. It is the decision of the user as to whether the increase 
in weak-source detection efficiency is worth nearly doubling the computation time. 

We compare the source detection results for our Test 8 with those shown in the WGACAT and 
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ROSATSRC catalogs (White, Giommi, & Angelini 1994, and Voges et al. 1994, respectively),^^ as 
well as those shown in Micela et al. and Damiani et al. (1997b). (The WGACAT and ROSATSRC 
catalog teams, and Micela et al., use variants of the sliding-cell algorithm.) We choose Test 8 
because it most closely resembles the analysis of Damiani et al., who perform a two-iteration 
analysis of the Pleiades image with wavelet functions of scale sizes 1, ^/2, 16 pixels, and with 
threshold significance So = 1.33 x 10~^ (4.2(t). Our result is nearly equal to that of Damiani et al., 
as we detect 148 sources, while they detect 150. We cannot directly compare our numerical results 
with those of Damiani et al., who only publish figures showing the performance of their algorithm 
on a subset of the Pleiades field. However, we can emulate Damiani et al. by showing how our 
results compare with those of the WGACAT and ROSATSRC catalogs (Table 5, cf. Damiani et 
al. Table 1). Our results are virtually identical to those of Damiani et al., in, e.g., how many sources 
are detected only by our algorithm, etc. We thus may conclude that our algorithm and that of 
Damiani et al. generate a largely similar source list. 

A large fraction of the sources that we detect, but that are not included in the WGACAT 
or ROSATSRC catalogs, are located either near the inner telescope support ring (« 20 off-axis) 
or near the precipitous drop-offs in exposure caused by telescope vignetting near the edge of the 
FOV (Figure 15; see also Figure 16). A visual examination of these sources indicates that they are 
not spurious. We further examine in detail the two sources near the edge of the FOV that were 
not included in either the WGACAT or ROSATSRC catalogs, but are in regions covered by other 
ROSAT FSPC pointings described by Stauffer et al. (1994). We find that these sources lie within 
rpsF pixels of the Stauffer et al. sources 159 and 197 (see their Table 2), which have reported count 
rates ~ 0.023 and 0.016 ct sec~^ respectively. These rates are consistent with our derived count 
rates. This demonstrates the robustness of our simple exposure correction method. 

We also note the intriguing result that the local background map computed by our algorithm 
indicates the presence of an X-ray shadow in the core of the cluster (Figure 17; Kashyap et al. 2001, 
in preparation). Because the Pleiades cluster is located beyond the edge of the local bubble of 
hot gas (Prisch 1995), this shadow, which is most pronounced at low energies, is of more distant 
sources of the diffuse X-ray background (DXBG) , such as the extragalactic component and obscured 
stars in the Pleiades cluster itself. The depth of the shadow places constraints on the nature 
of the stellar mass-function at low masses, and in particular rules out models where the mass- 
function is extrapolated at a constant slope from higher masses, thus providing independent. X-ray 
observational support for optical observations that report drops in the mass-function at low masses 
{Mb ~ 10; see, e.g., Tinney, Mould, & Reid 1992, Bahcall et al. 1994). 

4.3. Chandra ACIS-I: The Lockman Hole 

Finally, we demonstrate that our algorithm can handle the low background amplitudes which 
are characteristic of Chandra observations, while continuing to outperform the sliding cell, by 
applying it to a simulated 30 ksec Chandra ACIS-I image of the Lockman Hole (T. Gaetz, private 



Available from HEASARC: http://heasarc.gsfc.nasa.gov/. 
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communication; Figure 18). The ACIS-I is comprised of four 1024x1024 pixel CCDs configured in 

a 2x2 square, with FOV ^ 17x17 (ss 50 times smaller than that of the ROSAT PSPC). Within 
the ACIS-I field are placed: (1) 12 optically identified ROSAT PSPC X-ray sources catalogued by 
Schmidt et al. (1998), including one extended cluster source;^^ (2) fs 6000 point sources sampled 
from the Hasinger et al. (1998) logiV — logS distribution between 10"-*^^ and 5xl0~^^ erg cm~^ 
sec~^; and (3) 19500 particle background counts (corresponding to a particle background rate of 
1.5x10"''' ct pix~^ sec~^). 

If we assume the input parameters listed for Test 1 in Table 4, and do not use an exposure map, 
we detect 171 sources in the full ACIS-I field, of which four are directly associated with the extended 
galactic cluster (Figure 18). In Figure 19, we show differential logA'^ — logS" distributions for both 
detected, and all, points sources in the FOV. On the basis of this figure, we may conclude that 
our algorithm will efficiently detect sources in ACIS-I images with fluxes ^ 10~^^ erg cm~^ sec~^ 
(0.5 - 2 keV), and has the ability to detect Poisson sampling fluctuations for sources with fluxes ^ 
10~^^ erg cm~^ sec~^. Our result compares very favorably with the performance of CELLDETECT, 
which detects 51 sources with > 3. (Further comparison of the source detection efficiencies of 
WAVDETECT and CELLDETECT is provided by Kim et al. 2001, in preparation.) We conclude that the 
relative superiority of our wavelet detection algorithm, with respect to the sliding cell, is inversely 
proportional to the background amplitude. 

In Figure 20, we plot the offsets of the locations of detected sources from their actual locations 
within the FOV. (These offsets are caused by the asymmetry of the Chandra PSF, and their 
magnitude increases with off-axis angle.) We note two characteristics of these offsets. First, the 
variation in the offsets does not depend on source strength, signifying that the source-locating 
process is insensitive to the strength of the source. Second, the observed offsets are significantly 
smaller than the expected mean separation between sources (~ 50 pix), implying that the observed 
offsets arc due to the asymmetries inherent in the PSF and not due to source misidentifications. 
We thus find that the weak sources are as "well-behaved" as strong sources (about whose detection 
and identification there can be little doubt), and hence infer that even the weakest detected sources 
are real. 



5. Comparison with Existing Algorithms 

The source detection algorithm which we present in this work resembles algorithms published 
previously by Vikhlinin et al. (1994), Rosati et al. (1995), Grebenev et al. (1995), Damiani et 

al. (1997a), and Lazzati et al. (1998). In this section, we highlight important differences between 
our algorithm and these other algorithms, all of which, having been developed for analyzing ROSAT 
PSPC data, suffer from inherent limitations that do not allow them to be directly applied to data 
from, e.g., Chandra. We do not discuss how our method of source characterization differs from 
those described previously, because these other methods are built upon the premise that the PSF 



Other detected X-ray sources that lie within the ACIS-I FOV, but were not optically identified, have not been 
included. 
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has Gaussian shape. Thus they are simply not directly applicable in situations where the PSF has 
a more complex shape. 

5.1. Correlation Image 

Our basic method for computing the correlation images is the same as that used by Vikhlinin 
et al., Rosati et al., Grebenev et al., and Lazzati et al., with the exception that Rosati et al. use 
the symmetric Morlet wavelet function 

(28) 

instead of the MH function. However, these authors do not attempt to correct for exposure vari- 
ations, as they focus their attention upon the center of the ROSAT PSPC FOV, and they use a 
fundamentally different method to determine source detection thresholds (see below). 

The method by which Damiani et al. correct for exposure and detect sources differs substan- 
tially from ours. They first divide the raw data image (which they refer to as the "photon image") 
by an exposure map on a pixel-by-pixel basis, to create the so-called "count-rate image." Pixels 
with relative exposure less than a certain amount (e.g. 0.2) are not included, which introduces a 
sharp edge in the count-rate image where telescope vignetting becomes important. Damiani et 
al. correlate the wavelet function with this image, applying an analytic correction to correlation 
values near the edge (as given in their eq. 12). Because the data in the count-rate image are not 
Poisson-distributed, Damiani et al. must convert source detection threshold values, derived from 
their background map in the photon image, to values appropriate for the count-rate image. They 
accomplish this by dividing the photon-image detection thresholds by an effective exposure time 
feflf, which is not a source exposure time that can be used to convert estimated source counts to 
count rates. Because Damiani ct al. derive the equation with which they compute effective exposure 
time in the Gaussian limit (sec their Appendix B), their exposure correction method is effectively 
limited to the high-counts regime: it cannot be applied as is to, e.g., typical Chandra data. 

5.2. Background and Source Detection 

As described in §3, we estimate the local background counts amplitude in each pixel by assum- 
ing that within the negative annulus of the wavelet, NW , there are no source counts (cq. 12). We 
then use that inferred amplitude to determine source detection thresholds. This is a generalization 
of the approach used by Vikhlinin et al., Rosati et al., Grebenev et al., and Lazzati ct al., in which 
the correlation variance < W^-kD > is used to determine thresholds (e.g., — $^=^ > 3.5). This 
approach will work if the background is locally flat or has a locally constant gradient, in which case 
the correlation of wavelet and background is zero. Such an approach is obviously insufficient for use 
with the whole FOV of X-ray detector, where support rib shadows and vignetting will affect the 
background, and/or in situations where the non-instrumental background varies markedly (such as 
in the Pleiades; see Figure 17). 
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Damiani et al. use the same basic approach to source detection as we do, in that they compute 
a local background amplitude, and use it to compute source detection thresholds. Damiani et 
al. compute the background map by first smoothing the raw data with a Gaussian with width 
> 2c7psF,i,j- They then interpolate background estimates within support rib shadows. At each 
scale, they compute the median value of the smoothed background within a square region with 
side-length I = 4((T^ -)-crpgp ; j) centered on the given pixel. (We explicitly use CTpgp instead of rpgp 
to highlight their assumption of Gaussian PSF shape.) After point sources are detected (using a 
source cleansing significance that is 0.2(T smaller than their source detection significance), regions 
around them containing 95% of the counts are masked out, and a refined background estimate is 
made by interpolating over the masked regions. 



6. Summary 

In this work, we present a generalized wavelet-based source detection algorithm that, in prin- 
ciple, can be applied immediately to image data collected by any photon counts detector, although 
it was developed specifically for the analysis of Chandra X-ray Observatory image data. We ex- 
clusively use the Marr, or Mexican Hat, wavelet function in this paper, but the basic details of 
our algorithm would be unchanged if we use other wavelet functions, such as the Haar or Morlet 
wavelet functions. Aspects of our algorithm include: (1) the computation of the correlation of the 
wavelet function and the data image using cither analytic or FFT methods; (2) the computation of 
a local, exposure-corrected normalized (i.e. fiat-fielded) background in each pixel; (3) its applicabil- 
ity within the low-counts regime, as it does not require a minimum number of background counts 
per pixel for the accurate computation of source detection thresholds; (4) the correction of those 
correlation values which are affected by large exposure variations within the wavelet support (due 
to, e.g., telescope support ribs or the edge of the field of view), using either one of two methods 
given by eq. 16 and eq. 17; (5) the generation of a source list in a manner that does not depend 
upon the details of the PSF shape, including the creation of general, data-dependent source cells 
for the estimation of source properties and filtering of extended source data; and (6) full error anal- 
ysis. In these respects, our algorithm is considerably more general than the similar wavelet-based 
methods developed by Vikhlinin et al. (1994), Rosati et al. (1995), Grebcnev et al. (1995), Damiani 
et al. (1997a,b), Starck & Pierre (1998), and Lazzati et al. (1998). Nearly all of these methods were 
developed specifically for treating data collected by the ROSAT PSPC (Starck Sz Pierre 1998 apply 
their method to data from the ROSAT HRI); none except Damiani et al. attempt to correct for 
variations in exposure within the FOV; and all except for Damiani et al. assume a flat background 
across the region of interest. The relatively more general method of Damiani et al. is limited to 
analyzing data from detectors which have PSFs with Gaussian shape, and which have high rates 
of background accumulation. These limitations make the Damiani et al. approach, as published, 
inappropriate for use with, e.g., Chandra data. 

In §4, we demonstrate the robustness of our algorithm by applying it, without algorithmic 
changes, to data collected by an idealized detector with a spatially invariant Gaussian PSF; to 
ROSAT PSPC data of the crowded field of the Pleiades Cluster; and to a simulated Chandra ACIS- 
I image of the Lockman Hole region. Collectively, these test cases indicate that our algorithm: (1) 



effectively detects and describes point sources, and can be applied to the study of extended sources; 
(2) does not detect more spurious sources than expected; (3) is more sensitive than sliding-cell 
methods; and (4) has equal sensitivity to the Damiani et al. method for tlic specific case of ROSAT 
PSPC data. We find that while we can use the algorithm as presented to analyze extended sources, 
such analysis requires careful monitoring on the part of the analyst. Work on generalizing this 
analysis is on-going, and will be reported in a future work. 
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A. Derivation of Formulae Associated with the Mexican Hat Function 

The source detection algorithm that wc have presented docs not explicitly depend on the 
details of wavelet function itself, and thus should be applicable with other, simple, wavelet functions 
which have one central positive mode. Hence we have deferred to this Appendix and the next the 
derivation of various formulae that we use in our algorithm which are valid specifically when using 
the Marr, or Mexican Hat (MH), wavelet function. 



Before carrying out source detection at given scales {(Tx^cTy)^ we must determine the grid of 
values Wi^i'j^j' (see 7). This is accomplished by integrating the function Wm{-^, defined in 
eqs. (4-5), on a pixel-by-pixel basis within an ellipse with axes with full lengths Wax and lOcXy, 
with values farther from the origin set to zero: 



A.l. Pixel- by-Pixel Integration of the MH Function 




) 



(Al) 



(A2) 



{x,y) represent coordinates relative to the center of pixel and and (xi,X2) and (2/1,2/2) denote 

the limits of integration over pixel (Figure 21): 
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X2 = I -i+- y2=j + 



We expand eq. (A2) and perform each integral separately. The first integral in the expansion 
is computed using 

^2 / ™2 \ /•t2 



dx exp 
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where the substitution t = is made, and erf(x) is the error function. Performing the same 

integration over y, we determine that the first term is 2^(Tj;CryXerfyerf = '^(^x^^yXeviVeri- 



To compute the second and third integrals, we must determine, e.g.. 



rX2 J2 / ^2 \ f-t2 

/ exp ( j = 2\/2(7^ / dtt^ exp{-f). 



(A3) 



This integral can be solved by parts, by taking the derivative of t and the integral of iexp(— t^). 
We present the solution: 

Xl exp j - exp ^"^^ + ^^(^xXeif = XdiS + ^j^cTxXevi- (A4) 



The final solution is 



Wi-i'j-j' = TraxCTyXeTfyeri - ^^(^yVeri (^XdiS + ^^(TxXerf^ - ^^CT^Xeri ^ydiff + ^^(^yVevf^ ■ 

(A5) 



A. 2. Fourier Transform of the MH Function 

Analytic computation of correlation values (eq. 7) may be too computationally intensive if 
the image and/or wavelet scale sizes are too large. So in WAVDETECT, for instance. Fast Fourier 
Transforms^^ (FFTs) are used if scale sizes are > 2 pixels: 

C FFT"^ [AT X FFT(W^) X FFT*(D)] . (A6) 

To mitigate the effect of the "wrap-around," any image that is to be transformed is padded with 
zeros; the minimum width of the padding is 10xmax((73;, cTj,). 



2°The CIAO WAVDETECT code (written in C) makes use of the FFTW package (Frigo & Johnson 1998), while our 
own Fortran code uses the pubUcly available MFFT algorithm (Nobile & Roberto 1986). 
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In our own Fortran code, we use the analytic Fourier Transform of the MH function, which we 
denote FT{W): 



FT{W) 
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The wave- number k equals Hn^^Dhi^ where ik is the pixel number in Fourier space, A;n = ^ is the 
Nyquist wave-number, and / is half the length of the relevant axis in the padded image. The fourth 
integral in eq. (A7) and the second integral in eq. (A8) are zero, as the integrands are products of 
even and odd functions. 
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^o,c(P) Q, A) and "^2,c{(i-,p) represent two integral solutions which one may find, e.g., in Gradshteyn 
& Ryzhik (1980; formulae 3.896-2 and 3.952-4 respectively): 



*o,c(p,g,A) = ^VTrexp ^-^^ cos(pA), 



and 



2p2 - 



exp - 



Substituting these quantities into eq. (A9), one finds that 
Cw = exp (^--^ - "^T^^klalj a^sPi^ 



„2n 

^T^klal + 1 - ^ 



(AlO) 
(All) 

(A12) 



Substituting eq. (A12) into eq. (A8) and solving, we find: 

r+oo / 2 

= exp{-2Tr^klal)axV2Tr J dy cos{2Trkyy) exp y- 



y 



21 



^TT^kiai + 1 - ^ 



-27- 



exp( 27r k^a^)ax\'i'T^ v^" "-x'^x ' -^y ^ u,cv-"'"'2/) i^/ 9 " ^'CV""""?/' /h 
= {2Tr fa^ayiklal + k^al) exp[-2Tr^ {klal + k^a^y)] . (A13) 

By using FT{W) instead of FFT{W), we reduce computation time, but numerical estimates 
are less accurate (with respect to estimates derived analytically). We quantify the discrepancy 
between any two of our three methods {FT{W), FFT{W), or analytic) using 

S ^ ^. (A14) 

J- an 

We use the analytically derived detection threshold Tan in the denominator because the expectation 
value of C is zero. It also provides an intuitive way to describe the discrepancy at the detection 
threshold. We find 5fft. an ^ 10 ^5 while <5FT,an ^ 10 ^? increasing with rp' . (See Figure 22.) 

The discrepancy results from the fact that we analyze binned images. Because the data are 
binned, we should actually compute FT{W) using the equation 

/ + CXD />+0O 
/ dxdyWi^i>j^ye^'''^''^''+''yy\ (A15) 
-co J —oo 

If W{x,y) exhibits significant curvature in a bin, then Wi-iij-j' ^ W{x,y) within that bin; the 
effect of this discrepancy in the final correlation value is proportional to the number of counts in 
the bin. Thus (5FT,An will increase with source strength, as shown in the right panel of Figure 22. 

If we wish to compute the error of the correlation value in a pixel, we calculate < W'^-kD >i,j, 
saving computation time by using the analytic Fourier Transform FT{W^). The derivation of this 
function is similar to the derivation of FT{W). A new integral which appears is: 
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One may solve this integral by parts, differentiating the term ^ cos(27rA:a;x) and integrating the 
term exp ( — ^ ) . The integral 
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has solution (see, e.g., Gradshteyn & Ryzhik 1980, formula 3.952-5): 

"+00 / 37r%-27r3fe^ 

dxx^ sm{2TTkxx) exp \^ -j = Viral — ^^^-^ exp(-7r^(T^fc^). (A18) 



The final solution is 

FT{W^) = [27raxay + n'a,ay{klal + kyy)']eM-^Hklal + kyy)]. (A19) 

Use of FT{W'^) instead of FFT{W^) leads to a less accurate accounting of the errors as derived 
using purely analytic methods, for reasons given above. We find the magnitude of the discrepancy 
in error estimates, as a fraction of the detection threshold, to be similar to that seen above (~ 
10~^); if we use the analytic error in the denominator instead of the detection threshold, we find 
that the discrepancy is constant as a function of ^, at ~ 10~^. 
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A.3. Integration of the MH Function Negative Annulus 

If exposure variations and the FOV edge may be ignored in the computation of the background, 
then we may replace < NW-kE >ij in eq. (12) with a background normahzation factor, Nb, which 
we derive here. 

The average value of the MH function is zero. Hence Nb may be derived by integrating the 
MH function over either PW or NW. We choose the former: 
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The limits of integration reflect that the core extends over an ellipse with axes V^ax and V^ay. We 
reparametrize the integral using polar coordinates {r,9), after first mapping the boundary ellipse 
to a boundary circle using the transformation y' = ^y: 
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The determinant of the Jacobian of the transformation from {x,y') — >■ {r,6) is r. 
We evaluate the second integral using integration by parts: 
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The second integral in eq. (A24) cancels with the first integral in eq. (A23), leaving: 
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B. Computation of Detection Thresholds 

We associate an image pixel (i, j) with a source if the significance Si^j of its correlation value 
Ci^j is greater than a user-defined detection threshold significance So- 

roo 

Sij = / dCpiC\Bij) > So, (Bl) 
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where Bij is either < NW-kD >ij, the convolution of the wavelet negative annulus NW with 
(possibly cleansed) image data D, or the amplitude in an input background map. p{C\Bij) is the 
probability sampling distribution (PSD) for C given Bi_j. (We are assuming here that the field 
of view is evenly exposed, which is true for the simulations we describe below. In a more general 



constant throughout the wavelet support, and that if we are computing the background, there are 
no source counts in the negative annulus. Since the PSD is dimensionless, it is independent of scale 
size (e.g. the PSD for Bij would be the same if we were to double ax and ay and reduce the count 
rate by a factor of four). 

We replace Bij with the equivalent quantity qij, the expected number of background counts 
within the spatial region spanned by the positive kernel of the wavelet, PW. {qij is related to 
Bij by multiplicative factors.) The PSD does not have analytic form when the total number of 
expected counts within the positive kernel of the wavelet, qij, is small 1); this is demonstrated 
by Damiani et al. (1997a). Like Damiani et al., we use simulations to estimate significances and 
detection thresholds; we carry these simulations out both within the regime they examine and also 
for lower expected counts values. Because this is a computationally intensive problem, we cannot 
accurately estimate significances for each pixel if Sij ^ 10~^. Instead, we compare the value of 
Cij in each pixel with the detection threshold Cij^o{So, Bij), defined by 



Here, we recast the equation using the variable qij, the number of expected counts in PW, as its 
use clarifies our description below of how we estimate Cjj-^o- 

To determine Cij^oiSo,Qi,j), we simulated over 50,000 1024x1024 flat-field images. For each 
image, we: 

• randomly selected values log^o from the range -10 < loggo < 3.25 (we describe why we chose 
this upper limit below); 

• determined the expected background amplitude Bg = in each image pixel {a = A pixels) ; 

• sampled data in each pixel from the appropriate Poisson distribution given rate Bg] 

• computed Cij and qij for each image pixel; 

• and recorded Cij in bins of size A(loggjj) = 0.2, for -6.9 < logg^j- < 3.1, with one bin being 
used for all values of logqij < -6.9.^^ Prom these distributions p{C\qij), we can determine 





Images that had at least one count; empty images were ignored. 
This is of the order of the machine precision. 
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Because there is an inverse correlation between observed values of Cjj- and qij, it is important to 
record piC\qij) and not p{C\qo)- Use of the latter distribution leads to underestimated detection 
thresholds, and thus to larger numbers of false source detections than one would expect, given Sg- 

We determined 25 values of Co{So, q) in each \ogq bin, for values of So ^ 10"''', using the central 
68% (17 values) to estimate "one-cr" errors on Co- We then fit these data with simple functions, 
minimizing the statistic. These functions we use describe the observed detection thresholds 
well, except in the regime logg^ ^ ^ and log5o ^ —4, where we use a look-up table instead. These 
fits allow us in principle to compute detection threshold for significances below our computational 
lower limit So ~ 10~^ (such as for So ~ 10~^, the significance corresponding to one false source 
pixel in an Chandra HRC image). 

In the regime logq'jj ^ and logS'o ^ —4, we compute \ogCo{qi,j) using the function 

logCo(gij) = Aio(loggij)2 + Sio(log%j) + Cio, (B2) 

where 

Ao = 0.00462 
Bxo = 0.0661 

Cio = - 0.0154(log5o)2 - 0.252(log5o) - 0.031 . 



In the regime logg^j ^ 0, we compute Co{qi,j) using the function 

Co{qi,j) = + -Bhi, (B3) 



Am = 



where 

-0.509(log5o) + 1.897 - .00172 (log^^ + 7)3-606 j^g^^ > .7 
-0.509(log5'o) + 1.897 logSo < -7 

5hi = - 1.115(log5o) - 1.038. 

This function is also used by Damiani et al. to fit detection thresholds in this counts regime, though 
their derived coefficients differ from ours. 

For values logg^ j ~ 0, we find that we must use the formula 

\ogCo{qi,j) = ^mid(log%,j) + Smid (B4) 

to compute the detection threshold, with 

-4mid = 0.00182(logS'o)2 + 0.0279(logS'o)2 + 0.158(logS'o) + 0.607 (B5) 

and 

r -0.064(logS'o) + 0.612 - 0.000115(logS'o + 7)^-'^^ -2 < log5o < -1 
^mid = < -0.064(log5o) + 0.612 - 0.00085(log5o + 7)^'^ -7 < logSo < -2 (B6) 
[ -0.064(log5o) + 0.612 \ogSo < -7 

For our simulations, we chose logqo = 3.25 as a upper limit because of the contention of Damiani 
et al. that if logqij ^ 3, the PSD probability sampling distribution is analytically representable as 
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a Gaussian with width o = -^Qij- (Note that the value of qij that we use in this work is larger 
than that used by Damiani et al. by a factor of 27r.) If this is the case, the significance is given by: 

1 



dC exp 
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= 1-1 ^T'' 

2 J'2.'Kqij Jo 




dC exp 



^2 



(B7) 



We find, however, that if we use this formula in the regime logQij ^ 3, the derived values of Cij^o 
are smaller than those predicted by eq. (B3) above. Thus, because it is more conservative, we use 
eq. (B3) to compute detection thresholds for all values logqij ^ 0. 



C. Covariance Estimate: Two-Iteration Background 

If the data are cleansed (see §3.1.3), then an exact calculation of T^[-Bj^j] will include non-zero 
covariance terms. In this section, we derive these terms assuming that the data are cleansed only 
once, i.e. we compute the variance for -82,* j, a background estimate made by convolving the wavelet 
negative annulus with data D2 that are a mixture of raw data D and first-iteration background 
estimates Bi: 

where Nij = Eij/ < NW*E >ij. 
Using eq. (18), 

V j' 

i' i">i' j' j">j' 

To calculate the variance, we must estimate both ^-z] and cov[D'-, j,, D'-„ j„]. The estimate of 

the former is given in cq. (22), derived assuming that the raw datum Diiji is sampled from a Poisson 
distribution with variance -Dj'j', and that each pixel's raw datum is independently sampled. 

Estimation of cov[D'-, j,, D'-„ j„] is more complicated. For the two-iteration background case, 
there are three possibilities: D'-, = D^i and D'-„ ■„ = Din ,//; D'-, ■, = Dii ji and D'-„ •„ = 
Bi,i"j" (or vice-versa); or D'-, j, = and D'-„ j„ = Bi^ii/jn. Assuming that the raw data are 

independently sampled, in the first case the covariance is zero. In the second case, we can estimate 
the covariance using the approximation (Eadie et al. p. 27): 

cov[A'.', = E E E E (^) {^j:^) cov[D,„D,,] 
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- (^) 

d 

^ J 

= V [A' j' ] Ni'' J// NWi"-i' 
= Di' J' Ni" J" NWi" —i' J" —ji . 

In the above equations, represents the expectation vahic of the sampling distribution for D; for a 
Poisson distribution with variance equal to the datum, ji will be equal to the datum. In the third 
case, we find (while skipping some steps): 

= X X Ni',fNWi'_k,j'-lNi",j"NWi''_k,3"-lV[Dk,i] 

k I 

= ^i',j'^^i'-k,j'-lNi",j"^Wi''-k,j"-lDk,l . 

k I 

To demonstrate the effect of including covariance terms, we compute ^[-62] for a 50x50 subfield 
of the ROSAT PSPC Pleiades image, with ax = Cy = 4 pixels. In Figure 23, we show the ratio 
y[i?2]covar/^[-B2]nocovar- We infer the following from this computation: (1) it is quantitatively 
unimportant at the location of sources and within voids, where the change in variance is ^ 1%; 
(2) in the vicinity of a strong sources, the change in variance is ~ 10% (with maximum 30% 
for this particular field, in the vicinity of an 800-count source), and is affected by source crowding; 
and (3) including covariance terms increases the CPU time needed to compute l^[i?2] by a factor 
~ 0{dxdya'^(7y), where dx and dy are the x- and y-axis lengths, respectively, in pixels. 
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Fig. 1 



, — The two-dimensional Marr, or Mexican Hat, wavelet function (eq. 5). 
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(Adjust Correlation Map 
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Compute Source 
Detection Threshold Map 



Select Source 
Pixels 



Output Correlation Map 
Local Maxima that are 
Source Pixels 



Fig. 2. — Flow chart illustrating the source detection algorithm described in §3.1, as carried out at 
selected scales {ax, Cy). Parantheses indicate optional input or computations. The local background 
map may be estimated or provided by the user; a flow chart illustrating how it can be iteratively 
estimated using the input data (and exposure map) is given in Figure 3. 
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Fig. 3. — Flow chart illustrating the iterative local background estimation method described in 
§3.1.3. Note that a background map is output for each selected scale pair {(Jx-,cry). Parantheses 
indicate optional computations. 
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Fig. 4. — Illustration of how source counts may bias a final background estimate (§3.1.3), causing 
"bumps" if the wavelet scale sizes {ax, Oy) ^ rpsF, the characteristic PSF size at a given pixel. We 
use a 50x50 subfield of the ROSAT PSPC Pleiades Cluster image in which rpsF ~ 3 pixels. Top: 
image and surface plot of the background estimate for ax = (Ty = 1 pixel. Middle: same as top, 
but for ax = CTy = 2\/2 pixels. Bottom: same as top, but for ax = ay = 8 pixels. 
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Fig. 5. — Illustration of how source counts may bias an initial background estimate (§3.1.3), causing 
"rings" if sources are located within the negative annulus of the wavelet. We use a 50x50 subfield 
of the ROSAT PSPC Pleiades Cluster image. Top: image and surface plot of the background 
estimate after one iteration (i.e. no "cleansing"). Middle: same as top, but after two iterations. 
Bottom: same at top, but after three iterations. 
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Fig. 6. — Sample power spectra of the Mexican Hat wavelet function (W) and of the negative 
annulus of the Mexican Hat wavelet function (NW). N is the number of pixels in the (padded) 
image. The NW has much larger response to low-frequency components, but this signal would be 
suppressed upon correlation with W (see §3.1.4). 
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Fig. 7. — Flow chart illustrating the source list generation algorithm described in §3.2. Parantheses 
indicate optional input. 
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Fig. 8. — A sample source counts image (§3.2.2), created by first smoothing the data in a 50x50 
subfield of the ROSAT PSPC Pleiades Cluster image with a PW function of size (Tx = cry = 
2 pixels, then subtracting the estimated background. Pixels associated with the strongest local 
maxima comprise the source cells that are used for source property estimation. See Figures 9 and 
10. 
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Fig. 9. — Top: counts data showing an isolated Pleiades Cluster source, as observed by the ROSAT 
PSPC. Middle: source counts image data, created smoothing the counts data with a PW function 
of size Gx = CFy = 2 pixels, then subtracting the estimated background. Bottom: the source cell 
defined using the source counts image data (§3.2.2). This cell, used in the estimation of source 
properties, contains nearly all, if not all, of the counts from this source. 




Fig. 10. — Top: counts data showing two nearly overlapping Pleiades Cluster sources, as observed 
by the ROSAT PSPC Middle: source counts image data, created smoothing the counts data with 
a PW function of size ax = cry = 2 pixels, then subtracting the estimated background. Bottom: 
the source cells defined using the source counts image data. The saddle point seen in the middle 
image defines the boundary between the cells (§3.2.2). 
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Fig. 11. — Top: a 512x512 pixel image (Image A) showing a 1-ksec observation by an idealized 
detector with effective area 1000 cm^, a spatially invariant Gaussian PSF of width apsF = 2.56 
pixels, and an exposure map similar to that of the Einstein IPC (§4.1). Within this image were 
placed 42 point and 4 extended sources. The background is assumed to be locally variable, with 
amplitTidc ^ 10~^ ct sec~^ pix~^. Bottom: same as top, except the observation time is 10 ksec 
(Image B). 
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1 10 100 10 100 1000 

Predicted Counts Predicted Counts 

Fig. 12. — Top: comparison of observed counts, with estimated la errors, and upper limits for 
undetected sources, with predicted counts for the point sources of Image A. Upper limits are 
defined using the source detection threshold values at the correlation maxima nearest the location 
of the undetected sources, and are computed using eq. (6), with cTx = (Jy = VSctg = 4.43 pixels. 
The source exposure time (see Table 2), not the total observation time, is used to compute the 
predicted counts. Bottom: same as top, but for Image B. The two farthest outliers on this figure 
correspond to observed "sources" which are actually composed of two sources each. 



-47- 




Fig. 13. — Top: source cell image generated during the analysis of Image A, for wavelet scale size 
2\/2 pixels. The overlaid small circles and larger ellipses represent respectively the 42 point and 4 
extended sources randomly placed in the image. Bottom: same as top, but for Image B. 
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Fig. 14. — Top: subset of the source-cell image shown at the bottom of Figure 13, showing the 
cells generated in the vicinity of the largest extended source in Image B. Middle: same as top, 
except that now the source cells are defined using a source counts image with smoothing size 
pixels instead of 2\/2 pixels. Bottom: the normalized count-rate image for the extended source (and 

three point sources), generated by creating a source counts image with smoothing size 2 pixels, then 
filtering out all pixels not contained in the source cell shown immediately above. If this source were 
a galaxy cluster, the analyst could proceed to fit to these data (respecting the caveats discussed in 
§4.1). 



Fig. 15. — Top Left: the full iJOS'^T PSPC image of the Pleiades Cluster. Top Right: ellipses repre- 
senting the 148 sources detected by our algorithm. The ellipse sizes are set by deriving the la princi- 
pal axes and rotation angle for each source. Middle Left and Right: same as top left and right, with 
only the central l°xl° portion of the image shown. Bottom Left and Right: same as top left and 
right, with only the central 30 x30 portion of the image shown. NOTE: this is a bitmap image. The 
original image may be downloaded from http://hea-www.harvard.edu/~pfreeinan/fl5.eps.gz. 
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Fig. 16. — Illustration of how the number of iterations used to estimate the local background 
amplitude (§3.1.3) affects the estimation of Pleiades Cluster source properties (§4.2). Left: the 
ratio of source predicted counts C1/C2 given backgrounds estimated using one iteration, and two 
iterations, as a function of C2. Right: same as left, but instead computed for two and three 
iterations. 
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Fig. 17. — Corrected background map generated during the analysis of the ROSAT PSPC image 
of the Pleiades Cluster (§4.2) via the method of §3.2.1. The contour levels are 0, 0.9, 1.05, 1.2, 
1.35, and 1.5 counts, with darker areas having more counts. Since the estimated error in each pixel 
is ~ 0.01 counts, the perceived structure is real and is indicative of X-ray shadowing (Kashyap et 
al. 2001, in preparation). NOTE: this is a bitmap image. The original image may be downloaded 
from http : //hea-www . harvard . edu/~pf reeman/ f 17 . eps . gz. 









Fig. 18. — Top: a simulated 30-ksec Chandra ACIS-I observation of the Lockman Hole. Data in 
all four chips are shown, re-binned by a factor of two for greater visual clarity. The gaps between 
chips are « 15 pixels. Bottom: ellipses representing the 171 sources detected by our algorithm. 
The ellipses, whose sizes are normally set by deriving the la principal axes and rotation angle for 
each source, have minimum axis lengths of 10 pixels for greater visual clarity. 



-53- 



10^ 



1000 - 



100 - 



10 - 



1 - 



T 1 — I I I I I I I 1 1 — I I I I I I I 1 1 — I I I I I 



T — I I I I I I I 1 31 



ACIS-I Simulation [t , = 30 ks] 



_i I 



J I I I I I I I I 1 1 1 1 1 



10 



-17 



10 



-16 



10"^^ 10 

s 



-14 



10 



-13 



Fig. 19. — Differential logA?^— logS" distributions for detected point sources (dotted line) and all point 
sources (solid line) in the Chandra ACIS-I Lockman Hole field. We conclude that our algorithm 

will efficiently detect sources in ACIS-I images with fluxes ^ 10~^^ erg cm~^ sec"-^ (0.5 - 2 keV), 
and has the ability to detect Poisson fluctuations for sources with fluxes ^ 10~^^ erg cm~^ sec~^. 
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Off— Axis Angle (Arcmin) 



Fig. 20. — Offsets of the locations of detected sources from their actual locations within the Chan- 
dra ACIS-I Lockman Hole field, as a function of off-axis angle. These offsets are caused by the 
asymmetry of the Chandra PSF, and the solid line represents its 95% encircled energy radius. Tri- 
angles represent sources with less than 6 counts; crosses, sources with between 6 and 15 counts; 
and stars, sources with more than 15 counts. This figure demonstrates that our ability to associate 
detected sources with actual sources does not vary as a function of source strength. 



Fig. 21. — Illustration of variables used when computing correlation values, x and y are continuous 
variables describing the wavelet function centered in pixel The correlation value for pixel 

is computed by summing the product of the data in pixels and the integral of the 

wavelet function in those pixels. The dotted line shows the boundary between the positive kernel 
PW and negative annulus NW of a Mexican Hat wavelet with ax = (Jy = I pixel, while the solid 
line shows the extent over which the summation is carried out; beyond this line, the amplitude of 
wavelet function is 0. 
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Fig. 22. — Left: discrepancy between the correlation values determined by the analytic method 
and by using the FFT, for the ROSAT PSPC data of the Pleiades cluster. In this example, ax = 
(Ty = 2 pixels. Right: same as left, but comparing the analytic method with the use of the analytic 
Fourier Transform for the MH function. 
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Fig. 23. — Illustration of the effect of including the computation of covariances in the calcu- 
lation of a two- iteration background (see Appendix C for details). Left: ratio of variances 

^[-B2]covar/^[-S2]no covar, for a 50x50 subficld of the ROSAT PSPC Pleiades Cluster image, an- 
alyzed with a wavelet with scale sizes cr^; = fjj^ = 4 pixels. Light regions have values ~ 1, while dark 
regions have values up to « 1.3. Right: the 50x50 subfield of the ROSAT VSVC Pleiades image 
for which the ratio of variances shown at left was computed. Sources in this field correspond with 
light (low-amplitude) regions in the ratio image. 
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Table 1. Source Detection Variance Formulae 



Property 



Variance 



Correlation (Qj) 



Exposure-Corrected 
Correlation {Ccor,i,j) 



< WME^V[Bnorm]) >ij + 

2 Eij < W^i.{EV[Bnorm]) >ij + 

< t^2^y[5„orm]) >i,j 



Approximate 
Exposure-Corrected 
Correlation (C^^) 



Normalized Background {Bnorm,i,j) < NW\Dn >i,j /(< NW-kE >i,jf 



Note. — i and j are pixel indices, and N is the number of iterations used 
to compute the background map, if it is not provided. 
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Table 2. Source Property Expectation Values 



Property 



Units 



Expectation Value 



X Location 
Y Location 
Bkg Counts 
Counts 
Exposure 
Count Rate 



pix 
pix 

ct 
ct 

varies 



5. 



Ds-Bs 



ct / exp unit Rg^c = Cs/tg 



Bkg Count Rate ct / exp unit Rg^B = Bg/tg 



Note. — i and j are pixel indices; sc denotes those image pixels which 
lie within the source's cell; Dg = Yliiesc^jesc-^iJ'-' ^'^'^ *o total 
exposure, in the same units as the exposure map. 
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Table 3. Source Property Variance Formulae 



Property 


Units 


Variance 


X Location (xg) 


pix 


i^/Dl) Eie.c E,e.c - Xgf 


Y Location [ys) 


pix 




Bkg Counts (Bg) 


ct 


\ — ^ \ — ^ IT'S T/^r 1 
ij L norm,i,jJ 


Counts (Cg) 


ct 


Dg + V[Bg] 


Exposure (tg) 


sec 


iVDl) E,e.c E,e.c {toE.^i - tgf 


Count Rate {Rg,c) 


ct / exp unit 


{l/tl){V[Cg]+RlcV[tg]) 


Bkg Count Rate {Rg,B) 


ct / exp unit 


{l/tl){V[Bg]+Rl^V[tg\) 



Note. — i and j are pixel indices; sc denotes those image pixels which lie within 
the source's cell; Dg = Eiesc Ejesc Aj; and to is the total exposure, in the same 
units as the exposure map. V[B'^^^^ ^^j] is defined in eq. (25). 



Table 4. Number of Detected Sources: ROSAT PSPC Pleiades Image 



Scale Exposure Number of 

Test Iterations Significance Separation Correction Sources 



1 


2 


10-6 


2 


Fast (eq. 17) 


129 


2 


2 


10-6 


V2 


Fast 


136 


3 


3 


10-6 


2 


Fast 


130 


4 


3 


10-6 


V2 


Fast 


137 


5 


3 


10-6 


2 


Full (eq. 16) 


132 


6 


3 


10-6 


V2 


Full 


138 


7 


2 


10-5 


2 


Fast 


144 


8 


2 


10-5 


V2 


Fast 


148 



Note. — Scale sizes range from cr = 1 to o" = 16 pixels. The threshold 
significance for source cleansing is So = 
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Table 5. Comparison Among Source Detection Algorithms: ROSAT PSPC Pleiades Image 



Method Number of Detected Sources 

Our Algorithm (Test 8) 148 

Damiani et al. (1997b) 150 

Micela et al. (1996) 99 

WGACAT 127 

MPE 100 

Common To: 

All Methods 81 

Our Algorithm & WGACAT 27 

Our Algorithm & MPE 12 
WGACAT k MPE 3 

Our Algorithm Only 28 

WGACAT Only 16 
MPE Only 4 



